-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: Assorted DatetimeIndex bugfixes #24157
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Hello @jbrockmendel! Thanks for submitting the PR.
|
Codecov Report
@@ Coverage Diff @@
## master #24157 +/- ##
===========================================
- Coverage 92.2% 43.02% -49.18%
===========================================
Files 162 162
Lines 51700 51714 +14
===========================================
- Hits 47670 22251 -25419
- Misses 4030 29463 +25433
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #24157 +/- ##
==========================================
- Coverage 92.22% 92.22% -0.01%
==========================================
Files 162 162
Lines 51785 51796 +11
==========================================
+ Hits 47758 47767 +9
- Misses 4027 4029 +2
Continue to review full report at Codecov.
|
Azure fail is in developer.rst, unrelated. |
@datapythonista is the azure fail caused by something in this PR? It looks unrelated but re-pushing didn’t fix it. |
master was broken (I merged an old PR that didn't have some linting), but it's been fixed in #24160 If you merge master and push again, the CI should pass now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls add to #6581 for the deprecation
pandas/core/arrays/datetimelike.py
Outdated
@@ -327,6 +327,10 @@ def __getitem__(self, key): | |||
"numpy.newaxis (`None`) and integer or boolean " | |||
"arrays are valid indices") | |||
|
|||
if key is Ellipsis: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you make a more explicit comment. This is likely a more general fix (IOW this is likely broken for all __getitem__
on EA)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why don't this just work as is? why is this not a view (and not a copy)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why don't this just work as is?
as-is DatetimeIndex[...]
loses its freq
attribute
why is this not a view (and not a copy)?
Since the copy doesn't have deep=True
, self.copy()
effectively a view?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you be more explicit in your comment, this is not obvious
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved to a hopefully clearer spot a few lines down
ci = pd.CategoricalIndex(dti) | ||
carr = pd.Categorical(dti) | ||
cser = pd.Series(ci) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you parametrize
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will take a look
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not without adding casting code that makes the verbosity a wash.
Once Categorical(DatetimeArray) works (currently raises bc DatetimeArray doesn't have _constructor) then this test can be parametrized over DatetimeIndex/DatetimeArray
Will do. Actually, is this private enough that a deprecation cycle is unnecessary? If so we should just remove the arg now and move the whole func to arrays.datetimes, since that is the only place it is used |
@TomAugspurger does this step on your toes? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
I think it'll apply cleanly aside from imports.
pandas/core/arrays/datetimelike.py
Outdated
@@ -327,6 +327,10 @@ def __getitem__(self, key): | |||
"numpy.newaxis (`None`) and integer or boolean " | |||
"arrays are valid indices") | |||
|
|||
if key is Ellipsis: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why don't this just work as is? why is this not a view (and not a copy)?
pandas/core/arrays/datetimelike.py
Outdated
@@ -327,6 +327,10 @@ def __getitem__(self, key): | |||
"numpy.newaxis (`None`) and integer or boolean " | |||
"arrays are valid indices") | |||
|
|||
if key is Ellipsis: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you be more explicit in your comment, this is not obvious
pandas/tseries/offsets.py
Outdated
@@ -2487,6 +2488,9 @@ def generate_range(start=None, end=None, periods=None, | |||
|
|||
""" | |||
if time_rule is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did you need to change any existing tests to account for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, tests.indexes.datetimes.test_date_range test_generate and test_generate_cday
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this sufficiently internal that we don't need to do a deprecation cycle?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, let's just drop it entirely (fix the tests). I don't think generate_range
is exposed, though maybe add in a note in api breaking changes just to cover.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do. OK with moving this to arrays.datetimes while we're at it?
doc/source/whatsnew/v0.24.0.rst
Outdated
@@ -1133,7 +1133,7 @@ Deprecations | |||
- Passing a string alias like ``'datetime64[ns, UTC]'`` as the `unit` parameter to :class:`DatetimeTZDtype` is deprecated. Use :class:`DatetimeTZDtype.construct_from_string` instead (:issue:`23990`). | |||
- In :meth:`Series.where` with Categorical data, providing an ``other`` that is not present in the categories is deprecated. Convert the categorical to a different dtype or add the ``other`` to the categories first (:issue:`24077`). | |||
- :meth:`Series.clip_lower`, :meth:`Series.clip_upper`, :meth:`DataFrame.clip_lower` and :meth:`DataFrame.clip_upper` are deprecated and will be removed in a future version. Use ``Series.clip(lower=threshold)``, ``Series.clip(upper=threshold)`` and the equivalent ``DataFrame`` methods (:issue:`24203`) | |||
|
|||
- Passing a `time_rule` to `pandas.tseries.offsets.generate_range` is deprecated and will raise a ``TypeError`` in a future version. Pass an ``offset`` instead (:issue:`24157`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these don't jive with what's at the top of the PR. missing 2 notes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about #11587 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it merit a note? All that is changed is an error message. But yes, this does close 11587, I'll update this above.
pandas/core/arrays/datetimelike.py
Outdated
# non-fixed frequencies are not meaningful for timedelta64; | ||
# we retain that error message | ||
raise e | ||
# GH#11587 if index[0] is NaT _generate_range will raise |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is too cryptic, pls make more readable if you don't actually know what's happening here
@@ -1652,6 +1654,13 @@ def maybe_convert_dtype(data, copy): | |||
raise TypeError("Passing PeriodDtype data is invalid. " | |||
"Use `data.to_timestamp()` instead") | |||
|
|||
elif is_categorical_dtype(data): | |||
# GH#18664 preserve tz in going DTI->Categorical->DTI | |||
# TODO: cases where we need to do another pass through this func, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is the TODO case here, what is an example? do you have an xfailed test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
example is Categorical(TimedeltaIndex) where we should do another pass through maybe_convert_dtype in order to issue the FutureWarning. I'm assuming there are other corner cases here that will need to be caught/tested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
small comments, ping on green.
@@ -380,6 +380,7 @@ Backwards incompatible API changes | |||
- ``max_rows`` and ``max_cols`` parameters removed from :class:`HTMLFormatter` since truncation is handled by :class:`DataFrameFormatter` (:issue:`23818`) | |||
- :meth:`read_csv` will now raise a ``ValueError`` if a column with missing values is declared as having dtype ``bool`` (:issue:`20591`) | |||
- The column order of the resultant :class:`DataFrame` from :meth:`MultiIndex.to_frame` is now guaranteed to match the :attr:`MultiIndex.names` order. (:issue:`22420`) | |||
- :func:`pd.offsets.generate_range` argument ``time_rule`` has been removed; use ``offset`` instead (:issue:`24157`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this probably won't render as its not in the api.rst but ok
@@ -1310,6 +1310,8 @@ Datetimelike | |||
- Bug in :class:`Index` where calling ``np.array(dtindex, dtype=object)`` on a timezone-naive :class:`DatetimeIndex` would return an array of ``datetime`` objects instead of :class:`Timestamp` objects, potentially losing nanosecond portions of the timestamps (:issue:`23524`) | |||
- Bug in :class:`Categorical.__setitem__` not allowing setting with another ``Categorical`` when both are undordered and have the same categories, but in a different order (:issue:`24142`) | |||
- Bug in :func:`date_range` where using dates with millisecond resolution or higher could return incorrect values or the wrong number of values in the index (:issue:`24110`) | |||
- Bug in :class:`DatetimeIndex` where constructing a :class:`DatetimeIndex` from a :class:`Categorical` or :class:`CategoricalIndex` would incorrectly drop timezone information (:issue:`18664`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where is the note for #11587?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just pushed, should be just below this line.
pandas/tseries/offsets.py
Outdated
@@ -2,6 +2,7 @@ | |||
from datetime import date, datetime, timedelta | |||
import functools | |||
import operator | |||
import warnings |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
prob going to fail the build as this is not needed
linting issue. ping on green. |
Ping |
thanks! |
Bit of a hodge-podge.
time_rule
inoffsets.generate_range
@TomAugspurger nothing here is urgent, so if this overlaps with DTA it can stay on the backburner.